Prediction on the Titanic dataset - this problem is about predicting the survival of passengers on the Titanic, using data analysis.
Go to http://54.179.140.163/contest/titanic-practice-problem# and create an account for yourselves.
See the video on the site.
Draw hypotheses on which factors could lead to a lower or higher survival rate on the Titanic. Also discuss why these might be possible factors. Examples:
  Gender
  Cabin Class
  Port of boarding
  Age
  Whether people were traveling in groups or alone
  ...
Go back to the website, and download the data set files (Train File, Test File and Sample Submissions).
Open the Train file - this is a file that has data on about 900 passengers, with features (gender, cabin class etc.) and whether they survived or not. We will use this data to "train" our knowledge.
Using Excel Filters or Pivot Tables, find out what % of males survived. What % of females? What do you observe?
If you had to predict whether a person will survive or not, just by looking at their gender, what rule would you follow?
Find out if the survival rate was different across 1st, 2nd and 3rd class.
If you had to predict whether a person will survive or not, just by looking at their cabin class, what rule would you follow?
Now, find the chances of survival by combining the above factors, i.e. for each of the following - (Male, 1st Class), (Male, 2nd Class), (Male, 3rd Class), (Female, 1st Class), (Female, 2nd Class), (Female, 3rd Class).
What are the observations?
If you had to predict whether a person will survive or not, looking at both their gender and cabin class, what rule would you follow?
Do you think this predictor rule is better or worse than the one based only on gender or only on cabin class?
Testing the Predictor - Open the "Test" file. This file has feature data on passengers, but does not specify whether each of these people survived or not. Create a column "Survived" and mark every entry under it 0 or 1, depending on whether, as per your predictor, that person would have survived or not.
Create a "Submission" file. A template of the file is enclosed.
To create a file for submission, create a new file.
From the Test file, copy the columns labeled ID and Survived (only those two columns), and paste them into the first two columns of the new file.
Save the file in CSV (comma-separated) format.
On the website mentioned, upload the submission file. Browse and add the same file you have created for both "Code File" and "Solution File". You may put any note in the description.
This will show you how accurate the prediction was. You can test different prediction criteria by creating different submission files.
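
For anyone who would rather script the exercise than use Excel, the same steps can be sketched in Python using only the standard library. This is an illustration under assumptions, not the site's reference solution: the column names "Sex" and "Pclass" are guesses at how the real files label gender and cabin class (only ID and Survived are named above), the two small CSV strings are made-up rows standing in for the Train and Test files, and the "more than 50% survived" threshold is just one possible rule.

```python
import csv
import io
from collections import defaultdict

# Made-up rows standing in for the Train file (the real one has about 900).
# "Sex" and "Pclass" are assumed column names; check the real header row.
TRAIN_CSV = """ID,Sex,Pclass,Survived
1,male,3,0
2,female,1,1
3,female,3,1
4,male,1,1
5,male,3,0
6,female,2,1
7,male,2,0
8,female,3,0
"""

# Made-up rows standing in for the Test file (no Survived column).
TEST_CSV = """ID,Sex,Pclass
101,male,1
102,female,3
103,male,2
"""


def survival_rates(rows, keys):
    """Return {group: fraction survived}, grouping rows by the given columns."""
    totals = defaultdict(int)
    survived = defaultdict(int)
    for row in rows:
        group = tuple(row[k] for k in keys)
        totals[group] += 1
        survived[group] += int(row["Survived"])
    return {group: survived[group] / totals[group] for group in totals}


def make_rule(rates, keys, default=0):
    """Build a predictor: 1 if the group's training survival rate is above 50%."""
    def predict(row):
        group = tuple(row[k] for k in keys)
        if group not in rates:
            return default  # group never seen in training data
        return 1 if rates[group] > 0.5 else 0
    return predict


train = list(csv.DictReader(io.StringIO(TRAIN_CSV)))
test = list(csv.DictReader(io.StringIO(TEST_CSV)))

# The pivot-table questions: by gender, by class, and by both combined.
by_gender = survival_rates(train, ["Sex"])
by_class = survival_rates(train, ["Pclass"])
by_both = survival_rates(train, ["Sex", "Pclass"])

# Predictor based on both gender and cabin class.
predict = make_rule(by_both, ["Sex", "Pclass"])

# Build the submission: only the ID and Survived columns, in CSV format.
out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(["ID", "Survived"])
for row in test:
    writer.writerow([row["ID"], predict(row)])
submission = out.getvalue()
print(submission)
```

To work with the real files, replace the `io.StringIO(...)` readers with `open(path, newline="")` on the downloaded Train and Test files, and write the submission with `open("submission.csv", "w", newline="")`; the file name here is only a placeholder. Swapping the `keys` list passed to `make_rule` (for example `["Sex"]` alone) gives a different predictor to compare against.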